Abstract

We show that unbounded fan-in boolean formulas of depth $d+1$ and size $s$ have average sensitivity $O(\frac{1}{d}\log s)^d$. In particular, this gives a tight $2^{\Omega(d(n^{1/d}-1))}$ lower bound on the size of depth $d+1$ formulas computing the parity function. These results strengthen the corresponding $2^{\Omega(n^{1/d})}$ and $O(\log s)^d$ bounds for circuits due to Håstad (1986) and Boppana (1997). Our proof technique studies a random process where the Switching Lemma is applied to formulas in an efficient manner.


1 Introduction 

We consider boolean circuits with unbounded fan-in AND and OR gates and negations on inputs. 
Formulas are the class of tree-like circuits in which all gates have fan-out 1. Size of circuits (including 
formulas) is measured by the total number of gates. Depth is the maximum number of gates on an 
input-to-output path. 

Lower bounds against bounded-depth circuits were first proved in the 1980s […], culminating in a tight size-depth tradeoff for circuits computing the parity function. The technique, based on random restrictions, applies more generally to boolean functions with high average sensitivity.

Theorem 1 (Håstad [1]). Depth $d+1$ circuits computing parity have size $2^{\Omega(n^{1/d})}$.

Theorem 2 (Boppana [2]). Depth $d+1$ circuits of size $s$ have average sensitivity $O(\log s)^d$.

In this paper, we prove stronger versions of these results for bounded-depth formulas: 

Theorem 3. Depth $d+1$ formulas computing parity have size $2^{\Omega(d(n^{1/d}-1))}$.

Theorem 4. Depth $d+1$ formulas of size $s$ have average sensitivity $O(\frac{1}{d}\log s)^d$.

Theorems 3 and 4 directly strengthen Theorems 1 and 2 in light of the following fact.

Fact 5. Every depth $d+1$ circuit of size $s$ is equivalent to a depth $d+1$ formula of size at most $s^d$.

Theorems 1, 2, 3 and 4 are asymptotically tight, since parity is computable by depth $d+1$ circuits (resp. formulas) of size $2^{O(n^{1/d})}$ (resp. $2^{O(d(n^{1/d}-1))}$).
The main tool in the proof of Theorems 1 and 2 is Håstad’s Switching Lemma [1]. The Switching Lemma states that every small-width CNF or DNF simplifies, with high probability under a random restriction, to a small-depth decision tree. This yields lower bounds against bounded-depth circuits via a straightforward depth-reduction argument. In this paper we show how the Switching Lemma can be applied more efficiently to bounded-depth formulas, though in a less straightforward manner.

In more detail: for independent uniformly distributed random $\sigma \in \{0,1\}^n$ (“assignment”) and $\tau \in [0,1]^n$ (“timestamp”), we consider the family of restrictions $\{R^{\sigma,\tau}_p\}_{0 \le p \le 1}$ (i.e. functions $[n] \to \{0,1,*\}$ representing partial assignments to input variables $x_1,\dots,x_n$) where $R^{\sigma,\tau}_p$ sets the variable $x_i$ to $\sigma_i$ if $\tau_i > p$ and leaves $x_i$ unset if $\tau_i \le p$. In the usual application of the Switching Lemma to circuits of depth $d+1$, all subcircuits of depth $k+1$ are hit with the restriction $R^{\sigma,\tau}_{p_k}$ for a fixed sequence $p_1 > \dots > p_d$ (typically $p_k = $ …). In this paper we achieve sharper bounds against formulas by hitting each subformula $\Phi$ with the restriction $R^{\sigma,\tau}_{q(\Phi)}$, where the parameter $q(\Phi)$ ($= q^{\sigma,\tau}(\Phi)$) is defined inductively, according to a random process indexed by subformulas. Our technical main theorem is a tail bound on $q(\Phi)$, viewed as a random variable determined by $\sigma$ and $\tau$.

After preliminary definitions in §2, we state and prove our technical main theorem in §3 and §4. As corollaries, we derive Theorem 3 in §5 and Theorem 4 in §6. In §7 we state a further corollary of our results on the relative power of formulas vs. circuits.

2 Preliminaries 

$\mathbb{N} = \{0,1,2,\dots\}$. $[n] = \{1,\dots,n\}$. $\exp(x) = e^x$.

2.1 Formulas 

A formula is a finite rooted tree whose leaves (“inputs”) are labeled by literals (i.e. variables $x_i$ or negated variables $\neg x_i$) and whose non-leaves (“gates”) are labeled by AND or OR. (Gates have unbounded fan-in.) Every formula $\Phi$ computes a boolean function on the same set of variables.

The size of a formula $\Phi$, denoted by $|\Phi|$, is the number of gates in $\Phi$. (Note that every lower bound on size is also a lower bound on leafsize, i.e., the number of leaves in a formula.) The depth of $\Phi$ is the maximum number of gates on an input-to-output path. Formulas of depth 0 are literals; formulas of depth 1 are clauses (i.e. an AND or OR of literals). We are often interested in formulas of depth $\ge 2$ and speak of “depth $d+1$” where $d$ is an arbitrary positive integer.

2.2 Boolean functions and restrictions 

A restriction is a function $\varrho : [n] \to \{0,1,*\}$, viewed as a partial assignment of boolean input variables $x_1,\dots,x_n$ to 0, 1 or $*$ (meaning “unset”). For a boolean function $f : \{0,1\}^n \to \{0,1\}$, the restricted function $f\restriction\varrho : \{0,1\}^{\varrho^{-1}(*)} \to \{0,1\}$ is defined in the usual way. For $p \in [0,1]$, we write $\mathcal{R}_p$ for the distribution on restrictions $\varrho$ where $\mathbb{P}[\varrho(i) = *] = p$ and $\mathbb{P}[\varrho(i) = 0] = \mathbb{P}[\varrho(i) = 1] = (1-p)/2$ independently for all $i \in [n]$.
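As a concrete illustration, here is a minimal Python sketch (not from the paper) of sampling $\varrho \sim \mathcal{R}_p$ and forming $f\restriction\varrho$ for a function given as a Python callable on a list of bits. The names `sample_restriction`, `restrict` and `STAR` are hypothetical.

```python
import random

STAR = "*"  # marks an unset coordinate of a restriction


def sample_restriction(n, p, rng=random):
    """Draw rho ~ R_p: each coordinate is * with probability p, else 0 or 1 with probability (1-p)/2 each."""
    return [STAR if rng.random() < p else rng.randint(0, 1) for _ in range(n)]


def restrict(f, rho):
    """Return (g, free): g is f|rho as a function of the starred coordinates `free`, in increasing order."""
    free = [i for i, v in enumerate(rho) if v == STAR]

    def g(*ys):
        x = list(rho)
        for i, y in zip(free, ys):
            x[i] = y  # fill the unset coordinates with the arguments of g
        return f(x)

    return g, free


parity = lambda x: sum(x) % 2
rho = sample_restriction(6, 0.5, random.Random(1))
g, free = restrict(parity, rho)
print(rho, free)
```

Restricting PARITY always yields PARITY or its negation on the free variables, which is why it is the standard hard example for this technique.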

2.3 Average sensitivity and decision-tree depth 

The average sensitivity $\mathrm{as}(f)$ of a boolean function $f$ is the expected number of input bits that, when flipped, change the output of $f$, starting with a random input assignment.
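A brute-force sketch of $\mathrm{as}(f)$ for small $n$ follows. It is exponential in $n$ and meant only for illustration; the function names are hypothetical. For parity every bit is sensitive at every input, so $\mathrm{as} = n$. For AND on 3 bits, only the all-ones input (3 sensitive bits) and the three inputs with a single zero (1 sensitive bit each) contribute, giving $6/8$.

```python
from itertools import product


def average_sensitivity(f, n):
    """as(f) by brute force: average over x in {0,1}^n of #{i : f(x) != f(x with bit i flipped)}."""
    total = 0
    for x in product((0, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]  # flip bit i
            total += f(y) != fx
    return total / 2 ** n


parity = lambda x: sum(x) % 2
and_all = lambda x: int(all(x))
print(average_sensitivity(parity, 4))   # 4.0
print(average_sensitivity(and_all, 3))  # 0.75
```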
The decision-tree depth $D(f)$ of $f$ is the minimum depth of a decision tree which computes $f$; in particular, $D(f) = 0$ iff $f$ is constant. Two elementary facts which we will use later (see [2]): for every boolean function $f$,

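(1) $\mathrm{as}(f) \le D(f)$ (i.e. average sensitivity is at most decision-tree depth),

(2) $\mathbb{E}_{\varrho\sim\mathcal{R}_p}[\,\mathrm{as}(f\restriction\varrho)\,] = p\cdot\mathrm{as}(f)$ for all $0 \le p \le 1$.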
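Both facts can be checked exhaustively on tiny examples. The sketch below uses hypothetical helper names and brute force throughout. It computes $D$ by recursing over which free variable to query and computes $\mathrm{as}(f\restriction\varrho)$ directly. It then evaluates $\mathbb{E}_{\varrho\sim\mathcal{R}_p}[\mathrm{as}(f\restriction\varrho)]$ exactly, with rational arithmetic, by enumerating all $3^n$ restrictions. This is a sanity check on examples, not a proof.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

STAR = None  # marks an unset coordinate of a restriction


def completions(rho):
    """All x in {0,1}^n that agree with the partial assignment rho."""
    free = [i for i, v in enumerate(rho) if v is STAR]
    for bits in product((0, 1), repeat=len(free)):
        x = list(rho)
        for i, b in zip(free, bits):
            x[i] = b
        yield tuple(x)


def make_tools(f):
    """Return (depth, avg_sens) computing D(f|rho) and as(f|rho) by brute force."""
    @lru_cache(maxsize=None)
    def depth(rho):
        if len({f(x) for x in completions(rho)}) == 1:
            return 0  # f|rho is constant
        # query some free variable i, recurse on both answers, keep the best i
        return min(
            1 + max(depth(rho[:i] + (b,) + rho[i + 1:]) for b in (0, 1))
            for i, v in enumerate(rho) if v is STAR
        )

    def avg_sens(rho):
        free = [i for i, v in enumerate(rho) if v is STAR]
        xs = list(completions(rho))
        flips = sum(f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:]) for x in xs for i in free)
        return Fraction(flips, len(xs))

    return depth, avg_sens


def check_facts(f, n, p):
    """Assert fact (1) for every restriction of f; return both sides of fact (2)."""
    depth, avg_sens = make_tools(f)
    lhs = Fraction(0)
    for rho in product((0, 1, STAR), repeat=n):
        assert avg_sens(rho) <= depth(rho)  # fact (1) for f|rho
        weight = Fraction(1)
        for v in rho:
            weight *= p if v is STAR else (1 - p) / 2  # probability of rho under R_p
        lhs += weight * avg_sens(rho)
    return lhs, p * avg_sens((STAR,) * n)


maj3 = lambda x: int(sum(x) >= 2)
print(check_facts(maj3, 3, Fraction(1, 3)))  # (Fraction(1, 2), Fraction(1, 2))
```

Using `Fraction` rather than floats makes the two sides of fact (2) compare exactly.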

Håstad’s Switching Lemma relates random restrictions and decision-tree depth. We give a somewhat nonstandard statement (the usual statement is in terms of width-$k$ CNFs and width-$\ell$ DNFs).

Lemma 6 (Switching Lemma [1]). Let $k,\ell \in \mathbb{N}$. Suppose $f$ is the AND or OR of an arbitrary family $\{f_i\}$ of boolean functions with $D(f_i) \le k$ for all $i$. Then for all $0 \le p \le 1$,
$$\mathbb{P}_{\varrho\sim\mathcal{R}_p}\big[\,D(f\restriction\varrho) \ge \ell\,\big] \le (5pk)^{\ell}.$$


3 A random process associated with formulas 


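Definition 7. Let $\sigma \in \{0,1\}^n$ (“assignment”) and $\tau \in [0,1]^n$ (“timestamp”) be independent uniformly distributed random variables. For $0 \le p \le 1$, let $R^{\sigma,\tau}_p : [n] \to \{0,1,*\}$ be the restriction
$$R^{\sigma,\tau}_p(i) := \begin{cases} \sigma_i & \text{if } \tau_i > p,\\ * & \text{if } \tau_i \le p. \end{cases}$$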


We regard the family of restrictions $\{R^{\sigma,\tau}_p\}_{0 \le p \le 1}$ as a stochastic process where the parameter $p$ represents a “time” which starts at 1 and decreases to 0. At the initial time $p = 1$, the assignment $\sigma$ is fully masked (i.e. $R^{\sigma,\tau}_1$ is all $*$’s). As $p$ decreases, the values of $\sigma$ are gradually unmasked, until the final time $p = 0$ when $\sigma$ is fully revealed (i.e. $R^{\sigma,\tau}_0 = \sigma$). Of course, for any fixed $p$, $R^{\sigma,\tau}_p$ is simply a random restriction with distribution $\mathcal{R}_p$.
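The coupling in Definition 7 can be sketched in a few lines of Python. The helper names are hypothetical. For each fixed $p$, a coordinate is starred with probability $\mathbb{P}[\tau_i \le p] = p$ and otherwise shows the uniform bit $\sigma_i$, so each single restriction has distribution $\mathcal{R}_p$. Because all times share one $(\sigma,\tau)$, decreasing $p$ only ever unmasks coordinates.

```python
import random


def sample_sigma_tau(n, rng=random):
    """Independent uniform assignment sigma in {0,1}^n and timestamp tau in [0,1]^n."""
    sigma = [rng.randint(0, 1) for _ in range(n)]
    tau = [rng.random() for _ in range(n)]
    return sigma, tau


def R(p, sigma, tau):
    """R_p^{sigma,tau}: coordinate i is sigma_i if tau_i > p and '*' if tau_i <= p."""
    return ["*" if t <= p else s for s, t in zip(sigma, tau)]


# Run "time" from p = 1 (everything masked) down to p = 0 (sigma revealed).
sigma, tau = sample_sigma_tau(8, random.Random(0))
for p in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(p, R(p, sigma, tau))
```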

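Definition 8 (Main Definition). For all formulas $\Phi$ we define the “stopping time” $q^{\sigma,\tau}(\Phi) \in [0,1]$ by the following induction:

• If $\Phi$ has depth 0 (i.e. $\Phi$ is a variable or negated variable), then $q^{\sigma,\tau}(\Phi) := 1$.

• If $\Phi$ is $\mathrm{AND}(\Psi_1,\dots,\Psi_m)$ or $\mathrm{OR}(\Psi_1,\dots,\Psi_m)$, then
$$q^{\sigma,\tau}(\Phi) := \frac{p^{\sigma,\tau}(\Phi)}{14\cdot k^{\sigma,\tau}(\Phi)}$$
where $p^{\sigma,\tau}(\Phi) := \min_i q^{\sigma,\tau}(\Psi_i)$ and $k^{\sigma,\tau}(\Phi) := \max\{1, \max_i D(\Psi_i\restriction R^{\sigma,\tau}_{p^{\sigma,\tau}(\Phi)})\}$.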

For the sake of readability, we will suppress $\sigma$ and $\tau$ whenever possible and simply write $q(\Phi)$, $p(\Phi)$, $k(\Phi)$. However, the reader should keep in mind that these random variables are determined, for all formulas $\Phi$, by a single pair $\sigma$, $\tau$. (We will continue to write $\sigma$ and $\tau$ when referring to restrictions $R^{\sigma,\tau}_p$.)

We view $q(\Phi)$ as the stopping time for a stochastic process indexed by formulas $\Phi$. For $\Phi$ of depth 0, $q(\Phi)$ is the initial time 1 (when all variables are masked). For $\Phi$ of depth $\ge 1$, $q(\Phi)$ is defined in terms of two auxiliary parameters:





• $p(\Phi)$ is the most advanced (i.e. minimum) stopping time $q(\Psi)$ among children $\Psi$ of $\Phi$.

• $k(\Phi)$ is the maximum decision-tree depth among children $\Psi$ of $\Phi$ upon being hit with the restriction $R^{\sigma,\tau}_{p(\Phi)}$. (For technical reasons, we set $k(\Phi) = 1$ in the event that $D(\Psi\restriction R^{\sigma,\tau}_{p(\Phi)}) = 0$ for all $\Psi$.)

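If $\Phi$ is an AND (resp. OR), then $\Phi\restriction R^{\sigma,\tau}_{p(\Phi)}$ is a $k(\Phi)$-CNF (resp. $k(\Phi)$-DNF). The choice of definition $q(\Phi) = p(\Phi)/(14\cdot k(\Phi))$ allows us to apply the Switching Lemma to $\Phi\restriction R^{\sigma,\tau}_{p(\Phi)}$. This is made precise by the following lemma. (Since the dependence on $\sigma$ and $\tau$ is crucial here, we use explicit notation $q^{\sigma,\tau}(\Phi)$, etc.)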
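To make Definition 8 concrete, the sketch below computes $q^{\sigma,\tau}(\Phi)$ exactly for a tiny formula with fixed $\sigma,\tau$. The tuple encoding of formulas and all function names are hypothetical, and decision-tree depth is computed by brute force, so this only works for a handful of variables. On any depth-1 formula it returns $1/14$, which matches the observation used in the base case of Theorem 10.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

# Hypothetical encoding: ("lit", i, True) is x_i, ("lit", i, False) is NOT x_i,
# and ("and", [children]) / ("or", [children]) are unbounded fan-in gates.


def evaluate(phi, x):
    if phi[0] == "lit":
        _, i, positive = phi
        return x[i] if positive else 1 - x[i]
    vals = [evaluate(c, x) for c in phi[1]]
    return int(all(vals)) if phi[0] == "and" else int(any(vals))


def dt_depth(phi, rho):
    """Brute-force decision-tree depth D(phi|rho); None in rho means '*'."""
    @lru_cache(maxsize=None)
    def go(r):
        free = [i for i, v in enumerate(r) if v is None]
        outputs = set()
        for bits in product((0, 1), repeat=len(free)):
            x = list(r)
            for i, b in zip(free, bits):
                x[i] = b
            outputs.add(evaluate(phi, x))
        if len(outputs) == 1:
            return 0
        return min(1 + max(go(r[:i] + (b,) + r[i + 1:]) for b in (0, 1)) for i in free)
    return go(tuple(rho))


def restriction(p, sigma, tau):
    """R_p^{sigma,tau} as a tuple, with None standing for '*'."""
    return tuple(None if t <= p else s for s, t in zip(sigma, tau))


def stopping_time(phi, sigma, tau):
    """q^{sigma,tau}(phi) from Definition 8, as an exact Fraction."""
    if phi[0] == "lit":
        return Fraction(1)
    p = min(stopping_time(c, sigma, tau) for c in phi[1])   # p(phi)
    rho = restriction(p, sigma, tau)
    k = max([1] + [dt_depth(c, rho) for c in phi[1]])      # k(phi)
    return p / (14 * k)


var = lambda i: ("lit", i, True)
phi = ("and", [("or", [var(0), var(1)]), ("or", [var(2), var(3)])])
sigma, tau = [0, 1, 1, 0], [0.3, 0.6, 0.01, 0.02]
print(stopping_time(phi, sigma, tau))  # 1/392
```

In this run both clauses have $q = 1/14$, so $p(\Phi) = 1/14$. Under $R^{\sigma,\tau}_{1/14}$ the first clause is fixed to 1, while $x_2 \vee x_3$ is left with decision-tree depth 2. That gives $k(\Phi) = 2$ and $q(\Phi) = (1/14)/28$.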

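Lemma 9. Let $\Phi$ be a formula of depth $\ge 1$ and let $q \in \mathrm{Supp}(q^{\sigma,\tau}(\Phi))$ (i.e. $q = q^{\sigma,\tau}(\Phi)$ for some $\sigma \in \{0,1\}^n$ and $\tau \in [0,1]^n$). Then for all $0 \le a \le 1$ and $\ell \in \mathbb{N}$,
$$\mathbb{P}_{\sigma,\tau}\Big[\,D(\Phi\restriction R^{\sigma,\tau}_{aq}) \ge \ell \;\Big|\; q^{\sigma,\tau}(\Phi) = q\,\Big] \le \Big(\frac{a}{e}\Big)^{\ell}.$$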


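Proof. Fix $\Phi$ and $q$ as in the hypothesis of the lemma. Since $\Phi$ has depth $\ge 1$, it is the AND or OR of formulas $\Psi_i$. Let
$$I := \Big\{(p,\varrho,k) \;:\; q = \frac{p}{14k} \text{ and there exist } \sigma \in \{0,1\}^n \text{ and } \tau \in [0,1]^n \text{ such that } p^{\sigma,\tau}(\Phi) = p,\ R^{\sigma,\tau}_p = \varrho \text{ and } k^{\sigma,\tau}(\Phi) = k\Big\}.$$
Note that $I$ is nonempty and indexes a partition of the event $\{q^{\sigma,\tau}(\Phi) = q\}$ into subevents $\{p^{\sigma,\tau}(\Phi) = p,\ R^{\sigma,\tau}_p = \varrho \text{ and } k^{\sigma,\tau}(\Phi) = k\}$.

To prove the lemma, consider any $(p,\varrho,k) \in I$. Conditioning on this subevent, we can view $R^{\sigma,\tau}_{aq}$ as the composition of $\varrho$ and an independent random restriction $\theta \sim \mathcal{R}_{a/(14k)}$ (note that $aq/p = a/(14k)$). Since $\Phi\restriction\varrho$ is an AND or OR of functions of decision-tree depth $\le k$, Lemma 6 implies
$$\mathbb{P}_{\sigma,\tau}\Big[D(\Phi\restriction R^{\sigma,\tau}_{aq}) \ge \ell \;\Big|\; p^{\sigma,\tau}(\Phi) = p,\ R^{\sigma,\tau}_p = \varrho \text{ and } k^{\sigma,\tau}(\Phi) = k\Big] = \mathbb{P}_{\theta\sim\mathcal{R}_{a/(14k)}}\big[D((\Phi\restriction\varrho)\restriction\theta) \ge \ell\big] \le \Big(5\cdot\frac{a}{14k}\cdot k\Big)^{\ell} \le \Big(\frac{a}{e}\Big)^{\ell},$$
where the last step uses $5e < 14$. Since these subevents partition $\{q^{\sigma,\tau}(\Phi) = q\}$, the lemma follows. □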


4 Tail bound on $q(\Phi)$


Our technical main theorem is a tail bound on the random variable $q(\Phi)$ ($= q^{\sigma,\tau}(\Phi)$) where the randomness is over independent uniform $\sigma \in \{0,1\}^n$ and $\tau \in [0,1]^n$. We state the result first with asymptotic notation.


Theorem 10. For every depth $d+1$ formula $\Phi$ and $0 < \lambda \le 1$,
$$\mathbb{P}[\,q(\Phi) < \lambda\,] \le \frac{|\Phi|}{\exp\big(\Omega(d\lambda^{-1/d}) - O(d)\big)}.$$
In order to have a usable induction hypothesis, we restate Theorem 10 with explicit constants:

Theorem 10 (more precisely). For every depth $d+1$ formula $\Phi$ and $\ell > 0$,
$$\mathbb{P}\Big[\,q(\Phi) < \frac{1}{14^{d+1}\ell}\,\Big] \le \frac{|\Phi|\,C^d}{\exp(e^{-2}d\,\ell^{1/d})}$$
where
$$C = 1 + \sum_{i=0}^{\infty}\bigg(\frac{1}{\exp(e^{i-1} - (i+1)e^{-2})} + \sum_{j=0}^{\infty}\frac{1}{\exp((j+1)e^{i-1} - (i+j+2)e^{-2})}\bigg) \approx 7.83.$$
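The numerical value of $C$ can be reproduced by truncating the double series. The sketch below is only an illustration; the helper name `constant_C` and the truncation limits are choices made here. All terms are positive and decay at least geometrically in $j$ and doubly exponentially in $i$, so the truncation error is far below the displayed precision.

```python
import math


def constant_C(imax=40, jmax=400):
    """Truncated evaluation of C = 1 + sum_i ( B-term_i + sum_j C-term_{i,j} )."""
    e2 = math.exp(-2)
    total = 1.0
    for i in range(imax):
        a = math.exp(i - 1)                                   # e^{i-1}
        total += math.exp(-(a - (i + 1) * e2))                # 1/exp(e^{i-1} - (i+1)e^{-2})
        for j in range(jmax):
            total += math.exp(-((j + 1) * a - (i + j + 2) * e2))
    return total


C = constant_C()
print(round(C, 2))                  # 7.83
print(C >= math.exp(math.exp(-1)))  # True, the fact used when ell < e^d
```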


Proof. We first note that the theorem is trivial if $\ell < e^d$ (as the RHS is $\ge (C/\exp(e^{-1}))^d \ge 1$ since $C \ge \exp(e^{-1})$). Therefore, we assume that $\ell \ge e^d$. We argue by induction on $d$.

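Consider the base case $d = 1$ where $\Phi$ is a depth 2 formula. Note that $q(\Psi) = 1/14$ for each depth 1 subformula $\Psi$ of $\Phi$; hence $p(\Phi) = 1/14$. Also, each $\Psi$ is the AND or OR of decision trees of depth 1; so by Lemma 6,
$$\mathbb{P}_{\sigma,\tau}\big[D(\Psi\restriction R^{\sigma,\tau}_{1/14}) \ge \ell\big] = \mathbb{P}_{\varrho\sim\mathcal{R}_{1/14}}\big[D(\Psi\restriction\varrho) \ge \ell\big] \le \Big(\frac{5}{14}\Big)^{\ell} \le \frac{1}{e^{\ell}}.$$
Since $q(\Phi) = p(\Phi)/(14\cdot k(\Phi)) = 1/(14^2 k(\Phi))$, a union bound over the at most $|\Phi|$ subformulas $\Psi$ gives
$$\mathbb{P}\Big[\,q(\Phi) < \frac{1}{14^2\ell}\,\Big] = \mathbb{P}[\,k(\Phi) > \ell\,] \le \frac{|\Phi|}{\exp(\ell)} \le \frac{|\Phi|\,C^d}{\exp(e^{-2}d\,\ell^{1/d})} \qquad (d = 1).$$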


For the induction step, let $d \ge 2$ and assume the theorem holds for $d-1$. Let $\Phi$ be a formula of depth $d+1$. Let $\Psi$ range over depth-$d$ subformulas of $\Phi$. In particular, we have $|\Phi| = 1 + \sum_\Psi |\Psi|$. We will define a family of events denoted $A$ and $B_i$ ($i \in \mathbb{N}$) and $C_{i,j}$ ($i,j \in \mathbb{N}$) and show that the union of these events covers the event $\{q(\Phi) < 1/(14^{d+1}\ell)\}$. We will then bound the probability of each of these events and show that the (infinite) sum of these probabilities is at most $|\Phi|\,C^d/\exp(e^{-2}d\,\ell^{1/d})$.

For all $i \in \mathbb{N}$, define $k_i$ and $a_i$ by
$$k_i := e^{i-1}\ell^{1/d}, \qquad a_i := \frac{k_i}{14^d\,\ell} = \frac{1}{14^d\,e^{1-i}\,\ell^{(d-1)/d}}.$$
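The bookkeeping with $a_i$ and $k_i$ in the rest of the proof relies on a few identities and sign conditions: $k_i = a_i 14^d\ell$, $k_0 \ge 1$ when $\ell \ge e^d$, $a_{i+1}/a_{i+j+1} = e^{-j}$, and the nonnegativity of the two exponents that appear in the bounds on $B_i$ and $C_{i,j}$. The following is a numerical spot check of these facts (not a proof) for a few parameter values. The function name `check_parameters` is hypothetical.

```python
import math


def check_parameters(d, ell, imax=6, jmax=6, tol=1e-9):
    """Numerically check the bookkeeping behind events A, B_i, C_{i,j} (not a proof)."""
    assert d >= 2 and ell >= math.e ** d  # standing assumptions of the induction step

    def k(i):  # k_i = e^{i-1} ell^{1/d}
        return math.exp(i - 1) * ell ** (1 / d)

    def a(i):  # a_i = k_i / (14^d ell)
        return k(i) / (14 ** d * ell)

    assert k(0) >= 1 - tol  # k_0 = e^{-1} ell^{1/d} >= 1 because ell >= e^d
    for i in range(imax):
        assert math.isclose(a(i), 1 / (14 ** d * math.exp(1 - i) * ell ** ((d - 1) / d)))
        assert math.exp(i - 1) - (i + 1) * math.exp(-2) >= 0  # exponent in the B_i bound
        for j in range(jmax):
            assert math.isclose(a(i + 1) / a(i + j + 1), math.exp(-j))  # ratio in the C_{i,j} bound
            assert (j + 1) * math.exp(i - 1) - (i + j + 2) * math.exp(-2) >= 0
    return True


print(check_parameters(2, math.e ** 2), check_parameters(5, 10 ** 4))
```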


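Events $A$ and $B_i$ and $C_{i,j}$ ($i,j \in \mathbb{N}$) are defined as follows:
$$\begin{aligned}
A &:= \big(p(\Phi) < a_0\big),\\
B_i &:= \bigvee_\Psi \big(q(\Psi) < a_{i+1}\big) \wedge \big(D(\Psi\restriction R^{\sigma,\tau}_{q(\Psi)}) \ge k_i\big),\\
C_{i,j} &:= \bigvee_\Psi \big(a_{i+j+1} \le q(\Psi) < a_{i+j+2}\big) \wedge \big(D(\Psi\restriction R^{\sigma,\tau}_{a_{i+1}}) \ge k_i\big).
\end{aligned}$$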

'I' 
Claim: If $q(\Phi) < 1/(14^{d+1}\ell)$, then $A \vee \bigvee_{i=0}^{\infty}\Big(B_i \vee \bigvee_{j=0}^{\infty} C_{i,j}\Big)$ holds.

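Proof of claim: Assume $q(\Phi) < 1/(14^{d+1}\ell)$ and further assume that $A$ does not hold. Clearly there exists a unique $i \in \mathbb{N}$ such that $a_i \le p(\Phi) < a_{i+1}$ (since $a_i$ is eventually $> 1$). Since $q(\Phi) = p(\Phi)/(14\cdot k(\Phi))$, we have $k(\Phi) > a_i 14^d \ell = k_i$. Note that $k_i \ge k_0 = e^{-1}\ell^{1/d} \ge 1$ (using the assumption that $\ell \ge e^d$). Since $k(\Phi) = \max\{1, \max_\Psi D(\Psi\restriction R^{\sigma,\tau}_{p(\Phi)})\}$, it follows that there exists a $\Psi$ such that $D(\Psi\restriction R^{\sigma,\tau}_{p(\Phi)}) > k_i$.

Fix an arbitrary choice of $\Psi$ such that $D(\Psi\restriction R^{\sigma,\tau}_{p(\Phi)}) > k_i$. There are two cases to consider: either $q(\Psi) < a_{i+1}$ or $a_{i+j+1} \le q(\Psi) < a_{i+j+2}$ for some $j \in \mathbb{N}$.

• Assume $q(\Psi) < a_{i+1}$. In this case, we have $D(\Psi\restriction R^{\sigma,\tau}_{p(\Phi)}) \le D(\Psi\restriction R^{\sigma,\tau}_{q(\Psi)})$ since $p(\Phi) \le q(\Psi)$ (for $p' \le p$, the restriction $R^{\sigma,\tau}_{p'}$ fixes every variable fixed by $R^{\sigma,\tau}_p$, and fixing more variables cannot increase decision-tree depth). Therefore, $D(\Psi\restriction R^{\sigma,\tau}_{q(\Psi)}) > k_i$. We conclude that $B_i$ holds.

• Assume $a_{i+j+1} \le q(\Psi) < a_{i+j+2}$ for some $j \in \mathbb{N}$. We have $D(\Psi\restriction R^{\sigma,\tau}_{p(\Phi)}) \le D(\Psi\restriction R^{\sigma,\tau}_{a_{i+1}})$ since $p(\Phi) < a_{i+1}$. Therefore, $D(\Psi\restriction R^{\sigma,\tau}_{a_{i+1}}) > k_i$. We conclude that $C_{i,j}$ holds.

This concludes the proof of the claim.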

To complete the proof of the theorem, we will bound the probabilities of events $A$, $B_i$ and $C_{i,j}$ and take a union bound. We ignore the fact that all but finitely many of these events are superfluous (for instance, $\mathbb{P}[C_{i,j}] = 0$ whenever $a_{i+j+1} > 1$, since $q(\Psi) \le 1$). Instead, we show that $\mathbb{P}[B_i]$ is exponentially decreasing in $i$, while $\mathbb{P}[C_{i,j}]$ is exponentially decreasing in $j$ and doubly exponentially decreasing in $i$.

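We first bound the probability of $A$:
$$\begin{aligned}
\mathbb{P}[A] &= \mathbb{P}\Big[\bigvee_{\Psi}\, q(\Psi) < \frac{1}{14^d\,e\,\ell^{(d-1)/d}}\Big]
\le \sum_{\Psi}\mathbb{P}\Big[\,q(\Psi) < \frac{1}{14^d\,(e\,\ell^{(d-1)/d})}\Big]\\
&\le \sum_{\Psi}\frac{|\Psi|\,C^{d-1}}{\exp\big(e^{-2}(d-1)\,e^{1/(d-1)}\,\ell^{1/d}\big)} && \text{(induction hypothesis)}\\
&\le \frac{|\Phi|\,C^{d-1}}{\exp(e^{-2}d\,\ell^{1/d})} && \big(e^{1/(d-1)} \ge 1 + \tfrac{1}{d-1}\big).
\end{aligned}$$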
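The bound on $\mathbb{P}[A]$, like the bounds on $\mathbb{P}[B_i]$ and $\mathbb{P}[C_{i,j}]$, converts the induction hypothesis’s exponent using the elementary estimate $e^y \ge 1+y$ (valid for all real $y$). Written out once, for $d \ge 2$:
$$(d-1)\,e^{x/(d-1)} \;\ge\; (d-1)\Big(1 + \frac{x}{d-1}\Big) \;=\; d-1+x \qquad (x \in \mathbb{R}).$$
It is applied with $x = 1$ for $A$ (giving $\ge d$), with $x = -i$ for $B_i$ (giving $\ge d-1-i$), and with $x = -(i+j+1)$ for $C_{i,j}$ (giving $\ge d-i-j-2$).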
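We next bound the probability of $B_i$:
$$\begin{aligned}
\mathbb{P}[B_i] &= \mathbb{P}\Big[\bigvee_\Psi \big(q(\Psi) < a_{i+1}\big) \wedge \big(D(\Psi\restriction R^{\sigma,\tau}_{q(\Psi)}) \ge k_i\big)\Big]\\
&\le \sum_\Psi \mathbb{P}[\,q(\Psi) < a_{i+1}\,]\cdot\mathbb{P}\big[D(\Psi\restriction R^{\sigma,\tau}_{q(\Psi)}) \ge k_i \;\big|\; q(\Psi) < a_{i+1}\big]\\
&\le \sum_\Psi \frac{\mathbb{P}\big[q(\Psi) < 1/(14^d e^{-i}\ell^{(d-1)/d})\big]}{\exp(e^{i-1}\ell^{1/d})} && \text{(Lemma 9 with } a = 1)\\
&\le \sum_\Psi \frac{|\Psi|\,C^{d-1}}{\exp(e^{i-1}\ell^{1/d})\,\exp\big(e^{-2}(d-1)e^{-i/(d-1)}\ell^{1/d}\big)} && \text{(induction hypothesis)}\\
&\le \sum_\Psi \frac{|\Psi|\,C^{d-1}}{\exp(e^{i-1}\ell^{1/d})\,\exp\big(e^{-2}(d-1-i)\ell^{1/d}\big)} && \big(e^{-i/(d-1)} \ge 1 - \tfrac{i}{d-1}\big)\\
&\le \frac{|\Phi|\,C^{d-1}}{\exp\big((e^{i-1} - (i+1)e^{-2})\ell^{1/d}\big)\,\exp(e^{-2}d\,\ell^{1/d})}\\
&\le \frac{|\Phi|\,C^{d-1}}{\exp\big(e^{i-1} - (i+1)e^{-2}\big)\,\exp(e^{-2}d\,\ell^{1/d})}.
\end{aligned}$$
The last inequality uses the assumption $\ell^{1/d} \ge 1$ as well as the nonnegativity of $e^{i-1} - (i+1)e^{-2}$ for all $i \in \mathbb{N}$.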

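Finally, we bound the probability of $C_{i,j}$:
$$\begin{aligned}
\mathbb{P}[C_{i,j}] &= \mathbb{P}\Big[\bigvee_\Psi \big(a_{i+j+1} \le q(\Psi) < a_{i+j+2}\big) \wedge \big(D(\Psi\restriction R^{\sigma,\tau}_{a_{i+1}}) \ge k_i\big)\Big]\\
&\le \sum_\Psi \mathbb{P}[\,q(\Psi) < a_{i+j+2}\,]\cdot\mathbb{P}\big[D(\Psi\restriction R^{\sigma,\tau}_{a_{i+1}}) \ge k_i \;\big|\; a_{i+j+1} \le q(\Psi) < a_{i+j+2}\big]\\
&\le \sum_\Psi \mathbb{P}[\,q(\Psi) < a_{i+j+2}\,]\cdot\Big(\frac{a_{i+1}}{e\,a_{i+j+1}}\Big)^{k_i} && \text{(Lemma 9 with } a = a_{i+1}/q(\Psi))\\
&= \sum_\Psi \frac{\mathbb{P}\big[q(\Psi) < 1/(14^d e^{-(i+j+1)}\ell^{(d-1)/d})\big]}{\exp\big((j+1)e^{i-1}\ell^{1/d}\big)}\\
&\le \sum_\Psi \frac{|\Psi|\,C^{d-1}}{\exp\big((j+1)e^{i-1}\ell^{1/d}\big)\,\exp\big(e^{-2}(d-1)e^{-(i+j+1)/(d-1)}\ell^{1/d}\big)} && \text{(induction hypothesis)}\\
&\le \sum_\Psi \frac{|\Psi|\,C^{d-1}}{\exp\big((j+1)e^{i-1}\ell^{1/d}\big)\,\exp\big(e^{-2}(d-i-j-2)\ell^{1/d}\big)} && \big(e^{-x/(d-1)} \ge 1 - \tfrac{x}{d-1}\big)\\
&\le \frac{|\Phi|\,C^{d-1}}{\exp\big(((j+1)e^{i-1} - (i+j+2)e^{-2})\ell^{1/d}\big)\,\exp(e^{-2}d\,\ell^{1/d})}\\
&\le \frac{|\Phi|\,C^{d-1}}{\exp\big((j+1)e^{i-1} - (i+j+2)e^{-2}\big)\,\exp(e^{-2}d\,\ell^{1/d})}.
\end{aligned}$$
The last inequality uses the assumption $\ell^{1/d} \ge 1$ and the nonnegativity of $(j+1)e^{i-1} - (i+j+2)e^{-2}$ for all $i,j \in \mathbb{N}$.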
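We finish the proof by taking a union bound:
$$\mathbb{P}\Big[\,q(\Phi) < \frac{1}{14^{d+1}\ell}\,\Big] \le \mathbb{P}[A] + \sum_{i=0}^{\infty}\Big(\mathbb{P}[B_i] + \sum_{j=0}^{\infty}\mathbb{P}[C_{i,j}]\Big) \le \frac{|\Phi|\,C^{d-1}}{\exp(e^{-2}d\,\ell^{1/d})}\cdot C = \frac{|\Phi|\,C^{d}}{\exp(e^{-2}d\,\ell^{1/d})}.$$
□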


5 PARITY 

We use the results of the last section to prove our lower bound for the parity function. 
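Theorem 3 (restated). Depth $d+1$ formulas computing parity require size $\exp(\Omega(d(n^{1/d}-1)))$.

Proof. Suppose $\Phi$ is a depth $d+1$ formula computing parity. Since a restriction of parity is non-constant exactly when at least one variable is left unset,
$$\mathbb{P}_{\sigma,\tau}\big[\Phi\restriction R^{\sigma,\tau}_{1/n} \text{ is non-constant}\big] = 1 - \Big(1 - \frac{1}{n}\Big)^n \ge 1 - \frac{1}{e}.$$
On the other hand, by Theorem 10 and Lemma 9,
$$\begin{aligned}
\mathbb{P}_{\sigma,\tau}\big[\Phi\restriction R^{\sigma,\tau}_{1/n} \text{ is non-constant}\big] &= \mathbb{P}_{\sigma,\tau}\big[D(\Phi\restriction R^{\sigma,\tau}_{1/n}) \ge 1\big]\\
&\le \mathbb{P}[\,q(\Phi) < 1/n\,] + \mathbb{P}\big[D(\Phi\restriction R^{\sigma,\tau}_{q(\Phi)}) \ge 1\big]\\
&\le \frac{|\Phi|}{\exp\big(\Omega(d n^{1/d}) - O(d)\big)} + \frac{1}{e}.
\end{aligned}$$
Therefore,
$$|\Phi| \ge \Big(1 - \frac{2}{e}\Big)\exp\big(\Omega(d n^{1/d}) - O(d)\big).$$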

It follows that there exist universal constants $c_0, c_1 > 0$ (determined by the constants in the $\Omega(\cdot)$ and $O(\cdot)$) such that $|\Phi| \ge \exp(c_0 d(n^{1/d}-1))$ in the regime $d \le c_1\ln n$.

In the regime $d \ge c_1\ln n$, we have $d(n^{1/d}-1) = \Theta(\ln n)$; more precisely,
$$\ln n \le d(n^{1/d}-1) \le c_1(e^{1/c_1}-1)\ln n.$$
Note that $d(n^{1/d}-1)$ is decreasing in $d$ and $\lim_{d\to\infty} d(n^{1/d}-1) = \ln n$. Invoking Khrapchenko’s $n^2$ leafsize lower bound [5] (which implies a (gate)size lower bound of $n$), we get a tight lower bound of $\exp(\Omega(d(n^{1/d}-1)))$ which is valid for all $d$ and $n$. □
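Two numeric ingredients of this proof can be spot-checked. The first is the bound $1 - (1-1/n)^n \ge 1 - 1/e$, together with $1 - 2/e > 0$, which keeps the final constant positive. The second is the behaviour of $d(n^{1/d}-1)$, which decreases in $d$ toward $\ln n$. The sketch below uses hypothetical function names and only checks instances; it does not prove either fact.

```python
import math


def prob_nonconstant(n):
    """P[parity restricted by R_{1/n} is non-constant] = P[at least one *] = 1 - (1 - 1/n)^n."""
    return 1 - (1 - 1 / n) ** n


def parity_exponent(d, n):
    """The quantity d (n^{1/d} - 1) from Theorem 3."""
    return d * (n ** (1 / d) - 1)


n = 10 ** 6
print([round(prob_nonconstant(m), 4) for m in (1, 2, 10, 1000)])
print([round(parity_exponent(d, n), 2) for d in (1, 2, 5, 20, 100, 1000)])
print(round(math.log(n), 2))  # the limit ln n as d grows
```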


6 Average Sensitivity 

Theorem 4 (restated). Depth $d+1$ formulas of size $s$ have average sensitivity $O(\frac{1}{d}\ln s)^d$.

Proof. Let $\Phi$ be a formula of depth $d+1$ and size $s$ (recall that size is the number of gates). Assume $\mathrm{as}(\Phi) \ge 1$, since otherwise the theorem is trivial. We further assume that $\Phi$ has bottom fan-in $\le s$; otherwise it is easily shown that $\mathrm{as}(\Phi) = O(\mathrm{as}(\Phi'))$ where $\Phi'$ is obtained from $\Phi$ by replacing every bottom AND (resp. OR) gate with fan-in $> s$ with 0 (resp. 1). In particular, $\Phi$ has leafsize $\le s^2$, so it depends on $\le s^2$ distinct variables.







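Letting $p = 1/\mathrm{as}(\Phi)$ and using facts (1) and (2), we have
$$1 = p\cdot\mathrm{as}(\Phi) = \mathbb{E}_{\sigma,\tau}\big[\mathrm{as}(\Phi\restriction R^{\sigma,\tau}_p)\big] \le \mathbb{E}_{\sigma,\tau}\big[D(\Phi\restriction R^{\sigma,\tau}_p)\big] = \sum_{k=1}^{s^2}\mathbb{P}_{\sigma,\tau}\big[D(\Phi\restriction R^{\sigma,\tau}_p) \ge k\big].$$
For all $k \in \mathbb{N}$, by Theorem 10 and Lemma 9,
$$\begin{aligned}
\mathbb{P}\big[D(\Phi\restriction R^{\sigma,\tau}_p) \ge k\big] &\le \mathbb{P}\big[D(\Phi\restriction R^{\sigma,\tau}_{\max(p,\,q(\Phi))}) \ge k\big]\\
&\le \mathbb{P}[\,q(\Phi) < p\,] + \mathbb{P}\big[D(\Phi\restriction R^{\sigma,\tau}_{q(\Phi)}) \ge k\big]\\
&\le \frac{s}{\exp\big(\Omega(d\cdot\mathrm{as}(\Phi)^{1/d}) - O(d)\big)} + \frac{1}{e^k}.
\end{aligned}$$
Combining these inequalities, we have
$$\exp\big(\Omega(d\cdot\mathrm{as}(\Phi)^{1/d}) - O(d)\big) = O(s^3).$$
It follows that $\Omega(d\cdot\mathrm{as}(\Phi)^{1/d}) \le 3\ln s + O(d)$ and therefore $\mathrm{as}(\Phi) = O(\frac{1}{d}\ln s)^d$. □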
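The “combining” step can be spelled out as follows; this is our expansion of the argument, not an additional claim. Write $X := \exp\big(\Omega(d\cdot\mathrm{as}(\Phi)^{1/d}) - O(d)\big)$ for the denominator in the tail bound. Since $\Phi$ depends on at most $s^2$ variables, only $k \le s^2$ contribute, and $\sum_{k\ge 1} e^{-k} = 1/(e-1)$:
$$1 \le \sum_{k=1}^{s^2}\Big(\frac{s}{X} + e^{-k}\Big) \le \frac{s^3}{X} + \frac{1}{e-1}, \qquad\text{hence}\qquad X \le \frac{s^3}{1 - 1/(e-1)} = \frac{e-1}{e-2}\,s^3 \approx 2.39\,s^3.$$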


7 Formulas vs. Circuits 

Our lower bound for PARITY (Theorem 3) implies a separation between the power of depth $d+1$ formulas vs. circuits. We write {poly-size depth $d+1$ circuits/formulas} for the non-uniform complexity class of languages computable by $n^{O(1)}$-size depth $d+1$ circuits/formulas where $d(n)$ is an arbitrary function of $n$.

Corollary 11. For all $d(n) = o(\log n)$ with $\lim_{n\to\infty} d(n) = \infty$,

(3) {poly-size depth $d+1$ formulas} $\subsetneq$ {poly-size depth $d+1$ circuits}.

Moreover, for all $d \le C\log n/\log\log n$ (for some universal constant $C > 0$),

(4) {poly-size depth $d+1$ circuits} $\not\subseteq$ {$n^{o(d)}$-size depth $d+1$ formulas}.

Separation (3) may be regarded as the depth $d+1$ analogue of the conjectured separation {poly-size formulas} $\subsetneq$ {poly-size circuits}, also known as $\mathrm{NC}^1 \ne \mathrm{P/poly}$. By Spira’s theorem [7], every poly-size formula is equivalent to a poly-size formula of depth $O(\log n)$; thus, extending (3) from depth $o(\log n)$ to depth $O(\log n)$ would imply $\mathrm{NC}^1 \ne \mathrm{P/poly}$ (in fact $\mathrm{NC}^1 \ne \mathrm{AC}^1$).

For the smaller range of $d \le C\log n/\log\log n$, we get the stronger separation (4). In light of Fact 5, this is the strongest possible separation between formulas and circuits of the same depth.

We remark that until recently not even the weak separation (3) was known to hold for any super-constant $d = \omega(1)$. The first progress on this question was made in [6], where (3) was shown to hold for all $d \le \log\log\log n$ via a lower bound for DISTANCE-$(\log\log n)$ st-CONNECTIVITY. In fact, the lower bound of [6] implies a much stronger result: for all $d \le \log\log\log n$,

(5) {poly-size depth $d+1$ circuits} $\not\subseteq$ {$n^{o(d)}$-size depth formulas}.

It remains an open problem to push separation (5) to greater depths.